REMARKS/ARGUMENTS
Status of the claims 

Claim 33 has been amended for clarity. Accordingly, upon entry of this 
amendment, claims 9, 10, 24, 25, 32, 33, and 36-38 will be pending for examination. 
Reconsideration is respectfully requested in light of the following remarks. 

Claim rejections under 35 U.S.C. § 103(a)

Claims 9, 10, 24, 25, 32, 33, 36, and 37 stand rejected under 35 U.S.C. § 103(a) as 
allegedly unpatentable over U.S. Patent No. 5,650,501 ("the '501 patent") in view of WO
01/53312 ("Tang"). Applicants respectfully traverse. 

The Supreme Court has affirmed the analysis set forth in Graham for the 
determination of obviousness. See KSR International Co. v. Teleflex Inc., 127 S. Ct. 1727
(2007). Furthermore, the Court preserved the teaching, suggestion, and motivation (TSM) test, 
stating "[t]here is no necessary inconsistency between the idea underlying the TSM test and the 
Graham analysis", noting that it is "important to identify a reason that would have prompted a 
person of ordinary skill in the relevant field to combine the elements in the way the claimed 
invention does." Id. at 1741; see also, USPTO memorandum on KSR v. Teleflex, dated May 3, 
2007. 

Thus, there continues to be a requirement that a motivation to combine the 
teachings must be explicitly and clearly stated by the Patent Office in making an obviousness 
rejection. The USPTO memorandum is in agreement, stating "[t]herefore, in formulating a
rejection under 35 U.S.C. § 103(a) based upon a combination of prior art elements, it remains 
necessary to identify the reason why a person of ordinary skill in the art would have combined 
the prior art elements in the manner claimed". See USPTO memorandum on KSR v. Teleflex, 
dated May 3, 2007. 

Moreover, as set forth in M.P.E.P. § 2143, "[t]o establish a prima facie case of 
obviousness, three basic criteria must be met: (1) there must be some suggestion or motivation, 
either in the references themselves or in the knowledge generally available to one of ordinary 
skill in the art, to modify the reference or to combine reference teachings; (2) there must be a
reasonable expectation of success; and (3) the prior art reference (or references when combined)
must teach or suggest all the claim limitations." All three elements set forth above must be 
present in order to establish a prima facie case of obviousness. 

Thus, the Graham factors, including the use of objective evidence of secondary
considerations to rebut a prima facie case of obviousness, as well as a flexible use of the TSM
test, remain the framework to be followed for a determination of obviousness.

A. The present rejection 

The Examiner has set forth a new basis for rejection based on the combination of 
the '501 patent and Tang. In making the present rejection, the Examiner states that "[t]he main 
point of this rejection is that the '501 patent . . . teaches that 47% sequence identity to the 
catalytic domain of a kinase polypeptide would be enough [to teach] that the protein in question 
would be a kinase. This teaching suggests that one of ordinary skill in the serine/threonine 
kinase art would recognize the SAK polypeptide of Tang, which has at least 77% sequence 
identity to the SAK kinase of the '501 patent, would have kinase activity". The Examiner then 
alleges that it would have been obvious for one of ordinary skill to replace the SAK polypeptide 
used in screens for modulators of cell proliferation as described in the '501 patent with SEQ ID 
NO: 2389 of Tang, which has 99.9% identity to SEQ ID NO: 2 of the presently claimed 
invention. See Office Action at 4. 

Applicants respectfully disagree with the basis for the Examiner's rejection for 
the reasons discussed below. 

B. 47% sequence identity in the catalytic domain of a putative protein
kinase, in the absence of direct biochemical confirmation, is insufficient to demonstrate that a 
particular amino acid sequence encodes a protein kinase. 

As a preliminary matter, Applicants respectfully submit that the '501 patent does 
not specifically make the claim alleged by the Examiner. Specifically, the '501 patent states: 
"[t]he Drosophila polo and Saccharomyces cerevisiae CDC5 genes encode serine/threonine
kinases which are highly related in sequence (47% identity in the catalytic domain) and in
function". See column 1, lines 11-15 of the '501 patent. Thus, the '501 patent is merely stating
the exact percent identity between the Drosophila polo and S. cerevisiae CDC5 genes, and it is
not making a generalized statement applicable to the sequences of all protein kinases. In fact, the 
interpretation that the Examiner has made of this sentence is based on impermissible hindsight, 
since the Examiner's interpretation is made with the full knowledge that both of the cited proteins 
had kinase activity and that they both had similar function. There is no suggestion in the 
sentence cited by the Examiner that 47% identity equals kinase activity or function. 

However, in any case, the person of ordinary skill would not accept sequence 
homology at a level of 47% by itself as being compelling evidence that a given amino acid 
sequence encodes a protein kinase, in the absence of an experimental demonstration of 
enzymatic activity. Instead, the person of ordinary skill would recognize that protein kinase 
activity and specificity requires a precise three dimensional configuration of amino acid residues 
at the enzyme's active site, which may lie within an overall protein sequence that may or may 
not be at least 47% identical to the sequence of another protein kinase. 

Furthermore, the skilled artisan would be aware that several kinases were known 
to lack catalytic activity, despite having homology to other protein kinase sequences. These 
proteins were postulated to function not as kinases, but rather as molecular scaffolds or as kinase 
substrates. See Manning et al., Science, 298: 1912-1934 (2002) ("Manning"; provided as
Appendix A) at page 1915. In fact, 50 "protein kinase" domain sequences have been proposed to 
be inactive in this manner. As described in the review article by Manning, "[t]hus, surprisingly, 
nearly 10% of all kinase domains appear to lack catalytic activity. However, these domains are 
otherwise well conserved and are likely to maintain the typical kinase domain fold. This 
suggests that this domain can have generalized noncatalytic functions". See Manning at page 
1915. The skilled artisan would also be aware that inactive protein kinase pseudogenes are 
present in the genomes of mice and humans. Around 100 protein kinase pseudogenes have been 
uncovered as a result of the complete sequencing of the human genome. See Manning at pages
1916-1917. Consequently, the skilled artisan would be unlikely to accept at face value a claim
that a particular protein sequence specified a protein kinase, based on sequence homology
alone, given the pitfalls associated with such an assumption based on the considerations
discussed above. 

Accordingly, the ordinarily skilled person would require experimental 
confirmation of kinase activity before concluding that a particular amino acid sequence encoded 
a protein kinase. For this reason, the skilled artisan would not consider the '501 patent to be 
enabled for the teaching of a SAK polypeptide with protein kinase activity based solely on 
sequence comparisons with known protein kinases. The '501 patent merely discloses: a 
sequence with sequence similarity to kinases of the polo and CDC5 family, the expression 
profile of a SAK mRNA during mouse development on Northern blots and by in situ 
hybridization on mouse embryo sections, the ability of an antisense SAK nucleic acid to reduce 
colony formation in transfected cells in tissue culture, and prophetic descriptions of enzymatic 
assays, based on the hypothesis that the cloned cDNA sequence encodes a protein kinase. No 
direct evidence of kinase activity is disclosed in the '501 patent. In contrast, the inventors of the 
present invention were the first to demonstrate that the claimed SAK polypeptide sequence does 
in fact specify a bona fide protein kinase by demonstrating that the sequence when expressed in 
E. coli results in a protein with autophosphorylation activity. Because the '501 patent is not 
enabled with respect to a teaching that the disclosed SAK sequence encodes a protein kinase, this 
reference cannot be used to infer kinase activity upon SEQ ID NO: 2389 of Tang. Thus, the 
combination of the '501 patent and Tang does not teach "a SAK polypeptide having at least 95% 
sequence identity to SEQ ID NO: 2, the polypeptide having serine/threonine kinase activity . . . 
wherein the functional effect is determined by measuring kinase activity of the SAK 
polypeptide", as recited in claim 9. Because the combination of the '501 patent and Tang may 
not be used to teach each and every element of the claimed invention, Applicants respectfully 
request withdrawal of the rejection under 35 U.S.C. § 103(a).

C. Even if at least 47% identity in the catalytic domains is accepted as defining a
protein kinase, the SAK catalytic domain sequence in the '501 patent has lower than 47% 
identity when compared to known serine/threonine kinases. 

Assuming, arguendo, that around 50% identity in the catalytic domains is 
accepted as defining a protein kinase, the disclosure of the '501 patent indicates that the 
disclosed SAK catalytic domain has a lower level of identity when compared to known 
serine/threonine kinases. Figure 4 of the '501 patent shows a sequence alignment among the 
disclosed SAK polypeptide sequence and the catalytic domains of the serine/threonine kinases, 
polo, Snk, CDC5, and Plk. Using the BLAST (bl2seq) program available at the NCBI site, 
Applicants have provided sequence alignments between the SAK polypeptide and the sequences 
of the catalytic domains disclosed in the '501 patent in Appendix B. The alignments shown in 
Appendix B indicate that the SAK polypeptide sequence disclosed in the '501 patent only has 
40% identity to the catalytic domain of polo; 40% identity to the catalytic domain of Snk; 36% 
identity to the catalytic domain of CDC5; and 36% identity to the catalytic domain of Plk. All 
these values of percent identity are below the about 50% identity suggested by the Examiner as 
being indicative of a protein sequence being that of a protein kinase. Thus, under this standard, 
the '501 patent could be construed as teaching away from the disclosed SAK sequence being a 
protein kinase. Such a teaching away would discourage the skilled artisan from inferring that
SEQ ID NO: 2389 of Tang was a protein kinase, and would thus fail to provide a motivation to
combine the cited references.
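
For illustration only, the comparison set out in this section can be sketched in Python. The identity values are those reported above from Appendix B, and the 47% threshold is the figure attributed to the Examiner's reading of the '501 patent. The function names and the simple gap-excluding definition of percent identity are assumptions made for this sketch; they do not reproduce the bl2seq program, whose reported identity may use a different denominator.

```python
# Percent identities reported in Appendix B for the SAK polypeptide of the
# '501 patent against each catalytic domain (values taken from the text).
REPORTED_IDENTITY = {"polo": 40, "Snk": 40, "CDC5": 36, "Plk": 36}

# Threshold attributed to the Examiner's reading of the '501 patent.
THRESHOLD = 47


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Hypothetical helper for illustration; alignment tools may instead divide
    by the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def below_threshold(identities: dict, threshold: float) -> list:
    """Return the names whose reported identity falls below the threshold."""
    return [name for name, value in identities.items() if value < threshold]


if __name__ == "__main__":
    print(below_threshold(REPORTED_IDENTITY, THRESHOLD))
```

Under these assumptions, all four reported comparisons fall below the 47% figure.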

D. Rejection of dependent claims 37 and 38 

Claims 37 and 38 stand rejected under 35 U.S.C. § 103(a) as allegedly 
unpatentable over the '501 patent in view of WO 01/53312 ("Tang"), further in view of U.S. 
Patent No. 5,589,356 ("the '356 patent"). Applicants respectfully traverse. 

In making this rejection, the Examiner has cited the '356 patent as teaching a 
circular peptide. See Office Action at page 5. As discussed above, the cited references, at a 
minimum, cannot be used to teach each and every element of independent claim 9, from which 
claims 37 and 38 depend. Furthermore, as explained above, the skilled artisan would lack a
motivation to combine the references under the rationale put forth by the Examiner. The '356
patent fails to remedy these deficiencies as required to establish a prima facie case of 
obviousness with respect to base claim 9. Accordingly, Applicants respectfully request 
withdrawal of the rejection with respect to dependent claims 37 and 38. 

In conclusion, for the above reasons, Applicants respectfully submit that a prima 
facie case of obviousness cannot be made using the cited references. Accordingly, Applicants 
request withdrawal of the rejection under 35 U.S.C. § 103(a).

CONCLUSION 

In view of the foregoing, Applicants believe all claims now pending in this 
Application are in condition for allowance. The issuance of a formal Notice of Allowance at an 
early date is respectfully requested. 

If the Examiner believes a telephone conference would expedite prosecution of 
this application, please telephone the undersigned at 925-472-5000. 

Respectfully submitted, 

Gene H. Yee
Reg. No. 57,471 

TOWNSEND and TOWNSEND and CREW LLP 

Two Embarcadero Center, Eighth Floor 

San Francisco, California 94111-3834

Tel: 925-472-5000 

Fax: 415-576-0300

Attachments 

GHY:lls 

61054385 v1

ERK2 pathways, which contributes to the
increased proliferative rate of tumor cells. For 
this reason, inhibitors of the ERK pathways 
are entering clinical trials as potential anti- 
cancer agents. In differentiated cells, ERKs 
have different roles and are involved in re- 
sponses such as learning and memory in the 
central nervous system. 

JNK1, JNK2, and JNK3 

The JNKs were isolated and characterized as
stress-activated protein kinases on the basis of 
their activation in response to inhibition of pro- 
tein synthesis (8). The JNKs were then discov- 
ered to bind and phosphorylate the DNA bind- 
ing protein c-Jun and increase its transcriptional 
activity. c-Jun is a component of the AP-1 
transcription complex, which is an important 
regulator of gene expression. AP-1 contributes 
to the control of many cytokine genes and is 
activated in response to environmental stress, 
radiation, and growth factors — all stimuli that 
activate JNKs. Regulation of the JNK pathway 
is extremely complex and is influenced by 
many MKKKs. As depicted in the STKE JNK 
Pathway Connections Map, there are 13 
MKKKs that regulate the JNKs. This diversity 
of MKKKs allows a wide range of stimuli to 
activate this MAPK pathway. JNKs are impor- 
tant in controlling programmed cell death or 
apoptosis (9). The inhibition of JNKs enhances 
chemotherapy-induced inhibition of tumor cell 
growth, suggesting that JNKs may provide a 
molecular target for the treatment of cancer. 



JNK inhibitors have also shown promise in 
animal models for the treatment of rheumatoid 
arthritis (10). The pharmaceutical industry is 
bringing JNK inhibitors into clinical trials for 
both diseases. 

p38 Kinases 

There are four p38 kinases: α, β, γ, and δ. The
p38α enzyme is the most well characterized
and is expressed in most cell types. The p38
kinases were first defined in a screen for drugs
inhibiting tumor necrosis factor α-mediated in-
flammatory responses (11). The p38 MAPKs
regulate the expression of many cytokines. p38 
is activated in immune cells by inflammatory 
cytokines and has an important role in activa- 
tion of the immune response. p38 MAPKs are 
activated by many other stimuli, including hor- 
mones, ligands for G protein-coupled recep- 
tors, and stresses such as osmotic shock and 
heat shock. Because the p38 MAPKs are key 
regulators of inflammatory cytokine expres- 
sion, they appear to be involved in human 
diseases such as asthma and autoimmunity. 

Recently, a major paradigm shift for 
MAPK regulation was developed for p38α.
The p38α enzyme is activated by the protein
TAB1 (12), but TAB1 is not a MKK. Rather,
TAB1 appears to be an adaptor or scaffolding 
protein and has no known catalytic activity. 
This is the first demonstration that another 
mechanism exists for the regulation of 
MAPKs in addition to the MKKK-MKK- 
MAPK regulatory module. This important 



observation indicates that other adaptor pro- 
teins should be scrutinized for potential roles 
in regulating MAPK activity. 

The importance of MAPKs in controlling 
cellular responses to the environment and in 
regulating gene expression, cell growth, and 
apoptosis has made them a priority for re- 
search related to many human diseases. The 
ERK, JNK, and p38 pathways are all molec- 
ular targets for drug development, and inhib- 
itors of MAPKs will undoubtedly be one of 
the next group of drugs developed for the 
treatment of human disease (13).

The Protein Kinase Complement
of the Human Genome

G. Manning,1* D. B. Whyte,1 R. Martinez,1 T. Hunter,2
S. Sudarsanam1,3

We have catalogued the protein kinase complement of the human genome (the 
"kinome") using public and proprietary genomic, complementary DNA, and 
expressed sequence tag (EST) sequences. This provides a starting point for 
comprehensive analysis of protein phosphorylation in normal and disease 
states, as well as a detailed view of the current state of human genome analysis 
through a focus on one large gene family. We identify 518 putative protein 
kinase genes, of which 71 have not previously been reported or described as 
kinases, and we extend or correct the protein sequences of 56 more kinases. 
New genes include members of well-studied families as well as previously 
unidentified families, some of which are conserved in model organisms. Clas- 
sification and comparison with model organism kinomes identified orthologous 
groups and highlighted expansions specific to human and other lineages. We 
also identified 106 protein kinase pseudogenes. Chromosomal mapping re-
vealed several small clusters of kinase genes and revealed that 244 kinases map
to disease loci or cancer amplicons. 

Ever since the discovery nearly 50 years ago
that reversible phosphorylation regulates the ac-
tivity of glycogen phosphorylase, there has
been intense interest in the role of protein phos-
phorylation in regulating protein function. With
the advent of DNA cloning and sequencing in
the mid-1970s, it rapidly became clear that a
large family of eukaryotic protein kinases ex- 
ists, and the burgeoning numbers of protein 
kinases led to the speculation that a vertebrate 
genome might encode as many as 1001 protein 
kinases (1). The near-completion of the human 
genome sequence now allows the identification 
of almost all human protein kinases. The total 
(518) is about half that predicted 15 years ago, 
but it is still a strikingly large number, consti- 
tuting about 1.7% of all human genes. 
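
The proportions quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below is an illustration only: the three input figures come from the text, while the variable names and the implied total gene count are derived here and are not stated by the authors.

```python
# Figures quoted in the paragraph above.
TOTAL_KINASES = 518        # putative protein kinase genes identified
PREDICTED_KINASES = 1001   # earlier speculation cited as reference (1)
FRACTION_OF_GENES = 0.017  # "about 1.7% of all human genes"

# "About half that predicted": ratio of the observed total to the prediction.
ratio_to_prediction = TOTAL_KINASES / PREDICTED_KINASES

# Total gene count implied by the 1.7% figure (a derived value, not a quote).
implied_gene_count = TOTAL_KINASES / FRACTION_OF_GENES

if __name__ == "__main__":
    print(f"ratio to prediction: {ratio_to_prediction:.2f}")
    print(f"implied gene count: {implied_gene_count:.0f}")
```

The ratio comes out slightly above one half, and the 1.7% figure corresponds to a total of roughly 30,000 human genes.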

Protein kinases mediate most of the signal 
transduction in eukaryotic cells; by modifica- 
tion of substrate activity, protein kinases also 
control many other cellular processes, includ- 
ing metabolism, transcription, cell cycle pro- 
gression, cytoskeletal rearrangement and cell 
movement, apoptosis, and differentiation. 
Protein phosphorylation also plays a critical role in intercellular communication during

1SUGEN Inc., 230 East Grand Avenue, South San
Francisco, CA 94080, USA. 2Salk Institute, 10010
North Torrey Pines Road, La Jolla, CA 92037, USA.
3Genomics and Biotechnology, Pharmacia Corpora-
tion, 230 East Grand Avenue, South San Francisco, CA
94080, USA.

*To whom correspondence should be addressed. E-mail: gerard-manning@sugen.com

development, in physiological responses and
in homeostasis, and in the functioning of the 
nervous and immune systems. Protein ki- 
nases are among the largest families of genes 
in eukaryotes (2-6) and have been intensive- 
ly studied. As such, they made an attractive 
target for an initial in-depth analysis of the 
gene distribution in the draft human genome. 
Mutations and dysregulation of protein ki- 
nases play causal roles in human disease, 
affording the possibility of developing ago- 
nists and antagonists of these enzymes for use 
in disease therapy (7-9). A complete catalog 
of human protein kinases will aid in the 
discovery of human disease genes and in the 
development of therapeutics. 

Comprehensive Discovery of Protein 
Kinase Genes 

Most protein kinases belong to a single su- 
perfamily containing a eukaryotic protein ki- 
nase (ePK) catalytic domain. We set out to 
identify all sequenced human ePKs by 
searching every available human sequence 
source (public and Celera genomic databas- 
es, Incyte ESTs, in-house and GenBank 
cDNAs and ESTs) with a hidden Markov 
model (HMM) profile of the ePK domain 
(10). This profile is sensitive enough to detect
short fragments of even very divergent ki- 
nases that have little similarity to any single 
known kinase. We extended these fragments 
to full-length gene predictions using a com- 
bination of EST and cDNA data, Genewise 
homology modeling, and Genscan ab initio 
gene prediction; more than 90% of the new 
and extended sequences were verified by
cDNA cloning. We also identified 13 atypical 
protein kinase (aPK) families. These contain 
proteins reported to have biochemical kinase
activity, but which lack sequence similarity
to the ePK domain, and their close ho- 
mologs (10). Some aPKs have structural 
similarity to ePK domains (11). New aPKs 
were identified with the use of additional 
HMMs and Psi-Blast. 



Fig. 1. Dendrogram of 491 ePK domains from 478 genes. Major groups (Table 1) are labeled and
colored. For group-specific and comparative genomic trees, see www.kinase.com/human/kinome.

We identified 478 human ePKs and 40 aPK
genes (Table 1 and Fig. 1) (table S1). Of
these 518 protein kinases, 24 are absent from
the public Genpept database, and 47 more are
published only as hypothetical proteins or are
not described as kinases. Many more are
annotated only by automatic methods, or are
fragmentary sequences and have not been
individually studied. Most new kinases come
from new and little-studied families, as tar-
geted cloning has previously identified most
members of well-known families. However,
best studied kinase families. One new mem-
ber of the cyclin-dependent kinase (CDK)
family was found: CDK11 is a close paralog
of CDK8 (91% protein sequence identity for
most of their length), a kinase that interacts
with cyclin C and RNA polymerase II (12). A 
CDK11 ortholog exists in mouse, but fly 
(Drosophila melanogaster), worm (Caeno-
rhabditis elegans), and yeast (Saccharomyces
cerevisiae) have only a single member of this 
CDK8/CDK11 family. The Nek (NimA- 
related kinase) family is also thought to have 
a role in the cell cycle; we discovered four 
new Neks to bring the human total to 11 Nek
kinases. Within the mitogen-activated protein
kinase (MAPK) cascade, we found two new 
Ste11/MAP3K (MAP kinase kinase kinase)
and two new Ste20/MAP4K (MAP kinase 
kinase kinase kinase) genes, all of which have 
restricted expression that may explain their 
failure to be previously cloned. For instance, 
only 14 ESTs are known from MAP3K8, and 
all but one derive from testis, lung, or brain 
libraries, indicating that these new genes may 
have evolved to mediate specialized roles in 
selected tissues. 

Classification and Phylogeny of the
Human Kinome 

To compare related kinases in human and 
model organisms and to gain insights into 
kinase function and evolution, we classified 
all kinases into a hierarchy of groups, fami- 
lies, and subfamilies. This extends the Hanks 
and Hunter (13) human kinase classification 
of five broad groups, 44 families, and 51 
subfamilies by adding four new groups, 90 
families, and 145 subfamilies (Table 1 and 
Fig. 1) (table S1). Kinases were classified
primarily by sequence comparison of their 
catalytic domains (10), aided by knowledge 
of sequence similarity and domain structure 
outside of the catalytic domains, known bio- 
logical functions, and a similar classification 
of the yeast, worm, and fly kinomes (4). 

Of the four new groups, STE consists of 
MAPK cascade families (Ste7/MAP2K, Ste11/
MAP3K, and Ste20/MAP4K). The CK1 group 
contains CK1, TTBK (tau tubulin kinase), and 
VRK (vaccinia-related kinase) families. TKL 
(tyrosine kinase-like) is a diverse group of fam- 
ilies that resemble both tyrosine and serine- 
threonine kinases. It consists of the MLK 
(mixed-lineage kinase), LISK (LIMK/TESK),
IRAK [interleukin-1 (IL-1) receptor-associated
kinase], Raf, RIPK [receptor-interacting protein
kinase (RIP)], and STRK (activin and TGF-β
receptors) families. Members of the RGC (re-
ceptor guanylate cyclase) group are also similar
in domain sequence to tyrosine kinases.

Phylogenetic comparison of the human ki- 
nome with those of yeast, worm, and fly (4) 
confirms that most kinase families are shared
among metazoans and defines classes that are 
expanded in each lineage. Of 189 subfamilies 
present in human, 51 are found in all four 
eukaryotic kinomes, and these presumably 
serve functions essential for the existence of a 
eukaryotic cell. An additional 93 subfamilies 



are present in human, fly, and worm, implying 
that these evolved to fulfill distinct functions in 
early metazoan evolution. Comparison with the 
draft mouse genome indicates that more than 
95% of human kinases have direct orthologs in 
mouse; additional orthologs may emerge as that 
genome sequence is completed. 

The functions of human kinases can be 
inferred from family members in model 
organisms. For instance, the BRSK (brain- 
selective kinase) family has two uncharacter- 
ized human members that are selectively ex- 
pressed in brain. They are orthologous to 
worm SAD-1, which has a role in presynaptic 
vesicle clustering (14), suggesting a con- 
served function. A highly conserved ascidian 
(chordate) homolog is also expressed in neu- 
ral tissue and is asymmetrically localized to 
the posterior end of the embryo, suggesting a 
second role in embryonic axis determination 
(15). Conversely, we identified four families 
with orthologs in human, fly, and worm 
where no functional data are available for any 
member. Their phylogenetic distribution 
hints at roles fundamental to metazoan biol- 
ogy of which we are still ignorant. 

The human genome has approximately twice 
as many kinases as those of fly or worm, after 
idiosyncratic worm-specific expansions are 
trimmed (4). Accordingly, most kinase families 
have twice as many human members as they 
have in worm or fly. However, the expansion is 
not uniform: 25 subfamilies— including CDK5, 
CDK9, and Erk7 — have just one member in 
each organism, indicating critical unduplicated 
functions. Conversely, substantial human ex- 
pansions occurred in several families, with the 
most striking example being Eph family recep- 
tor tyrosine kinases (RTKs), where there are 14 
genes in human and only 1 in fly and worm 
(Table 2). These expanded families function 
predominantly in processes that are more ad-
vanced in human, such as the nervous and im- 
mune systems, angiogenesis, and hemopoiesis, 
as well as functions that are less obviously 
enhanced, such as apoptosis, MAPK signaling, 
calmodulin-dependent signaling, and epidermal 
growth factor (EGF) signaling. 

Fourteen families are found only in hu- 
man. The Tie family of RTKs are expressed 
in endothelial cells and function in angiogen- 
esis, and the Axl RTKs (Axl, Mer, and Ty- 
ro3) function in both hemopoietic and neural 
tissues. The Trio and RIPK families have 
invertebrate homologs that lack kinase do- 
mains. They are involved in muscle function 
and apoptotic signaling via tumor necrosis 
factor (TNF), Fas, and NF-κB, respectively.
Lmr, NKF3, NKF4, NKF5, and HUNK are 
novel families whose functions are largely 
unknown, and BCR, FAST, G11, H11, and
DNAPK are atypical kinases. 

The human expansions of many of these 
families can be traced both to large duplica- 
tions of multigene loci ("paralogons") and to 



local tandem duplications of smaller loci of- 
ten containing just one gene. This supports 
recent findings that vertebrate genome com- 
plexity may derive from ancient large-scale 
duplications as well as a continuing series 
of smaller scale duplications (16-18). For 
instance, each of the four human epidermal 
growth factor receptors (EGFRs) maps 
close to one of the four HOX clusters, 
implying that the proposed double duplica- 
tion of that cluster early in vertebrate evo- 
lution created the EGFR family from a 
single ancestral EGFR gene (19). Similarly, 
the eight genes of the VEGFR and PDGFR 
(vascular endothelial growth factor and 
platelet-derived growth factor receptors) 
families map to three of the four paraHOX 
clusters, and they probably derive from 
duplications of the single ancestral paraHOX 
locus as well as local duplications within the 
paraHOX loci (table S3). The common an- 
cestry of PDGFR and VEGFR families is 
supported by the Drosophila kinome, which 
contains two genes whose sequences are in- 
termediate between those two families (4). 

We mapped all kinase genes to chromo- 
somal loci to look for origins of kinase ex- 
pansions and to link kinases with known
disease loci. The map was created using the 
Celera and public genome assemblies and 
literature references (table S2). Although the 
overall kinase distribution is similar in den- 
sity to that of other genes, many pairs of 
closely related genes from the same families 
map closer to each other than expected by 
chance, indicating that they may have arisen 
through local chromosomal duplications (ta- 
ble S3). Seven pairs are within 30 kb of each 
other, all in tandem orientation. Another six 
pairs are within 1 Mb of each other, and 15 
more within 10 Mb. In all, 66 genes map 
unusually near to close paralogs, indicating 
that at least 6% of kinases may have arisen by 
local duplications. Most of these genes are 
from families that are highly expanded in 
human compared with worm and fly, further 
supporting a recent origin. The multigene 
duplications are thought to have arisen most- 
ly during early vertebrate evolution, but some 
local duplications may also have happened at 
this time. For instance, the clustering of 
PDGFRB and CSF-1 receptor (c-fms) genes
is conserved in pufferfish (20). 
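
The counts in this paragraph can be related to the "at least 6%" figure with a short calculation. The sketch below is an illustration only. It assumes, as one possible reading not stated in the text, that only one gene of each closely mapped pair is counted as the product of a local duplication, so that half of the 66 genes are compared against the 518 kinase total given earlier. All variable names are hypothetical.

```python
# Counts quoted in the paragraph above.
PAIRS_WITHIN_30_KB = 7
PAIRS_WITHIN_1_MB = 6
PAIRS_WITHIN_10_MB = 15
GENES_NEAR_PARALOGS = 66
TOTAL_KINASES = 518  # total kinase genes, quoted earlier in the article

# Number of closely mapped pairs explicitly enumerated in the text.
pairs_listed = PAIRS_WITHIN_30_KB + PAIRS_WITHIN_1_MB + PAIRS_WITHIN_10_MB

# Share of all kinases that map unusually near a close paralog.
share_near_paralog_pct = 100 * GENES_NEAR_PARALOGS / TOTAL_KINASES

# Assumed reading: one member of each pair is the newer, duplicated copy.
share_duplicated_pct = 100 * (GENES_NEAR_PARALOGS / 2) / TOTAL_KINASES

if __name__ == "__main__":
    print(pairs_listed)
    print(f"{share_near_paralog_pct:.1f}% near a paralog")
    print(f"{share_duplicated_pct:.1f}% under the one-copy-per-pair reading")
```

Under that reading the result is a little over 6%, which is consistent with the lower bound quoted above.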

Chromosomal Mapping and Disease 

The knowledge of the exact chromosomal lo- 
cations of genes afforded by the complete hu- 
man genome assemblies is increasingly valu- 
able in pinpointing candidate disease genes 
within loci that are associated with specific 
diseases. Comparison of the kinase chromo- 
somal map with known disease loci indicates 
that 164 kinases map to amplicons seen fre- 
quently in tumors (21) and 80 kinases map to 
loci implicated in other major diseases (table
S2). Although each locus covers many genes,
these data provide entry points for studying 
both the function of these kinases and their 
potential as the causative principle of these 
diseases. The role of kinases as biological 
control points and their tractability as drug 
targets make them attractive targets for dis- 
ease therapy. 

Catalytically Inactive Kinases 

Several ePK domains are known to lack kinase 
activity, experimentally, and these have been 
postulated to act as kinase substrates and scaf- 
folds for assembly of signaling complexes (22- 
24). Our sequence analysis shows that 50 hu- 
man kinase domains lack at least one of the 
conserved catalytic residues (Lys30, Asp125,
and Asp143) (table S5) and are predicted to be
enzymatically inactive. Twenty-eight inactive 
kinases belong to families where all members 
are inactive in human, fly, and worm, and even 
in yeast. Thus, surprisingly, nearly 10% of all 
kinase domains appear to lack catalytic activity. 
However, these domains are otherwise well 
conserved and are likely to maintain the typical 
kinase domain fold. This suggests that this domain
can have generalized noncatalytic functions;
it is also possible that they use a modified
catalytic mechanism that does not require these 
residues. This has been shown for the Wnk 
family, where Lys13 is thought to replace Lys30
in adenosine triphosphate (ATP) binding (25). 
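
The residue-based prediction described above can be illustrated with a minimal sketch. The three positions and expected residues are those named in the text; the function names, the input format (a mapping from profile position to aligned residue), and the two example domains are assumptions made for illustration and are not the authors' pipeline.

```python
# Conserved catalytic positions named in the text (ePK profile numbering).
CATALYTIC_RESIDUES = {30: "K", 125: "D", 143: "D"}


def missing_catalytic_residues(aligned_residues):
    """Return the profile positions whose conserved residue is absent.

    `aligned_residues` is a hypothetical mapping from profile position to
    the one-letter residue a domain places there ("-" for a gap).
    """
    return [
        pos
        for pos, expected in sorted(CATALYTIC_RESIDUES.items())
        if aligned_residues.get(pos, "-") != expected
    ]


def predicted_inactive(aligned_residues):
    """A domain is predicted inactive if at least one residue is missing."""
    return bool(missing_catalytic_residues(aligned_residues))


# Made-up example domains, for illustration only.
canonical = {30: "K", 125: "D", 143: "D"}
lys_lost = {30: "M", 125: "D", 143: "D"}

print(predicted_inactive(canonical))         # False
print(missing_catalytic_residues(lys_lost))  # [30]
```

As the Wnk example shows, a rule of this kind can only flag candidates; it cannot exclude a modified catalytic mechanism.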

The 50 "inactive" kinase domains fall into 
three main categories. First are domains that 
may act as modulators of other catalytic do- 
mains. GCN2 and JAK (Janus kinase) family 
kinases have dual ePK domains, one of which 
is inactive and may regulate the active do- 
main (26). Similarly, the inactive ePK do- 
main of receptor guanylate cyclases (RGCs) 
is thought to regulate the activity of the 
neighboring guanylate cyclase domain, in a 
manner that is modulated by ATP binding 
and phosphorylation (27). 

Second are other kinases with high similar- 
ity to the canonical ePK domain profile. These 
include the Ras pathway scaffold proteins KSR 
(kinase suppressor of Ras) (23) and the previously
undescribed KSR2, titin, ILK (integrin-linked
kinase), PSKH2 (protein serine kinase
H2), and unpublished kinases from the STLK
and Trbl families. The scaffold protein CASK
(calcium/calmodulin-dependent serine kinase)
contains an inactive protein kinase domain and
an inactive guanylate kinase domain, both of
which act as protein-protein interaction domains
(28, 29). This group also contains several
RTKs where an inactive kinase may dimerize
with and act as a substrate of another RTK:
Ryk, CCK4, the ephrin receptors EphA10 and
EphB6, and ErbB3 (24).

Third is a group whose members have 
very weak similarities to the kinase domain 
profile, and may have quite divergent func- 
tions. Of 37 "weak" kinase domains (whose 
kinase HMM E-value score is greater than 
1e-30), 26 lack one or more catalytic residues.
Note, however, that other weakly scoring
kinases have been shown experimentally
to have catalytic activity, including Bub1
(e-11 E value), VRK1 (e-10), PRPK (e-5), and
haspin (e-3) (30-33). 
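
The "weak" category is defined by a simple threshold on the kinase HMM E-value, which can be sketched as below. The cutoff and the four E-values are those quoted in the text, read here as powers of ten (an assumption about the notation); the function name is hypothetical.

```python
WEAK_EVALUE_CUTOFF = 1e-30  # threshold quoted in the text


def is_weak_domain(evalue, cutoff=WEAK_EVALUE_CUTOFF):
    """A domain is 'weak' when its kinase HMM E-value exceeds the cutoff."""
    return evalue > cutoff


# E-values as quoted in the text, read as powers of ten (an assumption).
quoted = {"Bub1": 1e-11, "VRK1": 1e-10, "PRPK": 1e-5, "haspin": 1e-3}

weak = sorted(name for name, e in quoted.items() if is_weak_domain(e))
print(weak)  # ['Bub1', 'PRPK', 'VRK1', 'haspin']

# Share of weak domains lacking a catalytic residue (26 of 37 in the text).
print(round(100 * 26 / 37))  # 70
```

All four experimentally active kinases fall on the weak side of the cutoff, which is why a weak score alone does not predict inactivity.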

Other Functional Domains in Protein 
Kinases 

Most protein kinases act in a network of 
kinases and other signaling effectors, and are 
modulated by autophosphorylation and phos- 
phorylation by other kinases. Other domains 
within these proteins regulate kinase activity, 
link to other signaling modules, or subcellularly
localize the protein. We identified 83
additional types of domain present in 258 of 
the 518 kinases, using profiles from the Pfam 
HMM collection (Table 3). In general, mem- 
bers of the same kinase family have the same 
domain structure, but some domain shuffling 
is seen, where individual members of fami- 
lies have gained or lost a domain and so may 
have altered function. For instance, the death 
domain is found in all four IRAK kinases as 
well as in single members of the DAPK and 
RIPK families. 

The most common domains mediate inter- 
actions with other signaling proteins: 24 kinases 
contain Src homology 2 (SH2) domains that 
bind to phosphotyrosine residues; other domains 
link to small guanosine triphosphatase (GTPase) 
signaling (38 kinases with RhoGEF, RhoGAP, 
RBD, PBD, RGS, CNH, HR1, or TBC domains),
lipid signaling (42 kinases with
DAG_PE, C2, PX, or PH domains), and calcium
signaling (28 kinases with CaM, IQ, or
OPR/PB1 domains); target the protein to the 
cytoskeleton (seven kinases with spectrin, cofi- 
lin, myosin head, or FCH domains); or mediate 
interactions with other proteins (46 kinases: 
Death, SID, SAM, DM, or ankyrin domains) or 
RNA (three kinases with RRM, DSRM, and 
putative RNA binding Tudor domains). Most of 
the domains found in new or extended sequenc- 
es are the same as those already seen in other 
family members, but some unpredicted domains 
are found, such as the previously unpublished 
leucine-rich repeat kinase (LRRK) family, con- 
taining arrays of leucine-rich repeats, as well as 
armadillo and ankyrin repeats. 

Most of the 58 RTKs, 12 receptor serine-threonine
kinases, and five receptor guanylate
cyclases also have recognizable ligand-binding
and other extracellular domains, along with
clear signal peptides and transmembrane regions.
Several nonreceptor tyrosine kinases are
also targeted to the membrane by lipidation or 
protein-protein interactions. Three kinases are 
targeted to the endoplasmic reticulum, five or 
six are likely to be mitochondrial, and most of 
the rest are thought to be cytoplasmic, nuclear, 
or both. 

Two hundred and sixty kinases contain no 
additional Pfam domains. Many are small 
proteins containing little more than an ePK 
domain and may be controlled by additional 
regulatory subunits, such as cyclins, which 
control CDK activity. Others contain con- 
served sequences that have not yet been clas- 
sified as domains and whose functions are 
unknown. 
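
A domain tally of the kind summarized in this section can be sketched as follows. The kinase names and their domain sets are placeholders invented for illustration; only the totals in the final check (258, 260, and 518) come from the text.

```python
from collections import Counter

# Placeholder annotations: kinase name -> set of additional Pfam domains.
annotations = {
    "kinase_A": {"SH2", "SH3"},
    "kinase_B": {"Death"},
    "kinase_C": set(),
    "kinase_D": {"SH2"},
}

# Number of kinases carrying each domain type.
domain_counts = Counter(d for doms in annotations.values() for d in doms)
# Kinases with no domain other than the kinase domain itself.
without_extra = sorted(k for k, doms in annotations.items() if not doms)

print(domain_counts["SH2"])  # 2
print(without_extra)         # ['kinase_C']

# Counts quoted in the text: 258 kinases with additional domains and
# 260 without together account for all 518 kinases.
assert 258 + 260 == 518
```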

Thirteen kinases have dual ePK domains, 
in which both domains appear to be active 
[six ribosomal S6 kinase (RSK) family ki- 
nases and two Trio family kinases] or the 
second domain is inactive (the four JAK fam- 
ily kinases and GCN2). The two RSK do- 
mains are involved in a kinase relay: Erk 
phosphorylates and activates the CAMK- 
group domain of RSK2, leading to autophos- 
phorylation on a linker region that then al- 
lows PDK1 to phosphorylate and activate the 
second AGC-group kinase domain (34). 

Kinase Pseudogenes 

The genome also contains many nonfunctional 
copies of kinase genes that are not expressed or 
encode degenerate, truncated proteins. These ki- 
nase pseudogenes are derived mostly from ret- 
roviral transposition and genomic duplications. 
Pseudogenes can confuse gene predictions, 
cross-hybridize with probes for functional 
genes, and contribute to disease by homologous 
recombination with their parental genes (35, 
36). We identified 106 pseudogenes containing 
similarity to the ePK domain or to an aPK (table 
S4); several other pseudogene fragments that 
lack a kinase domain were found but are not 
included here. All but two pseudogenes 
have open reading frames (ORFs) interrupt- 
ed by stop codons or frameshifts, which 
were verified by multiple independent se- 
quence sources. These ORFs typically have 
high protein sequence similarity to a func- 
tional ("parent") kinase; most are partial 
gene fragments. The two putative pseudo- 
genes with complete ORFs (CK2a-rs and 
STLK6-rs) lack introns and obvious promoters,
are absent from EST databases,
have >98.5% DNA sequence identity to 
their parents, and contain remnants of 
polyA tails in their genomic sequences. 
They are probably young processed pseudo- 
genes whose sequences have not yet 
diverged. 
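
The ORF criterion used above (interruption by stop codons or frameshifts) can be illustrated with a small check. This is a simplified sketch under stated assumptions: the two sequences are made-up toy examples, the function names are hypothetical, and a length that is not a multiple of three is used as a crude stand-in for a frameshift, which in practice is detected by comparison with the parent gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds):
    """Return 0-based codon indices of in-frame stops before the final codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def has_intact_orf(cds):
    """True if the length is a multiple of 3 and no premature stop occurs."""
    return len(cds) % 3 == 0 and not premature_stops(cds)


intact = "ATGGCTAAAGGTTGA"       # ATG GCT AAA GGT TGA
interrupted = "ATGGCTTAAGGTTGA"  # ATG GCT TAA GGT TGA

print(has_intact_orf(intact))        # True
print(premature_stops(interrupted))  # [2]
```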

Seventy-five kinase pseudogenes lack in- 
trons. Some are duplications of intronless genes 
or of single exons of larger genes, but most
appear to derive from viral retrotransposition of
a processed transcript. Additionally, some
intron-containing pseudogenes such as AurAps2
contain some parental introns but lack others, 
and may result from retrotransposition of a 
partially spliced transcript. 

Twenty-nine kinase pseudogenes contain 
clear introns and probably arose by genomic 
duplication. In some cases, these are part of a 
large duplicon (2, 5) containing multiple du- 
plicated genes. Such cases include two p70 
ribosomal protein S6 kinase (p70S6K) pseu- 
dogenes, which appear to arise from intrach- 
romosomal duplications of the p70S6K locus. 
These duplications are 20 kb and 70 kb in 
length, and are 90 to 95% identical in DNA 
sequence to the original locus. 

A few pseudogenes have no obvious hu- 
man parent but have functional orthologs in 
rodents and probably indicate the decay of 
previously functional genes. They include the 
polo-like kinase SGK384ps, whose mouse 
ortholog is intact, and the human orthologs of 
rat guanylate cyclases CGD and KSGC. 

Although pseudogenes appear to be evo- 
lutionary relicts, some may have some resid- 
ual or cryptic function. Many pseudogenes 
are transcribed: 26 kinase pseudogenes are 
seen in cDNA and EST databases (table S4), 
some represented by as many as 50 ESTs. 

The prevalence of pseudogenes varies great- 
ly between kinase families (Table 1) (table S4). 
The MARK (microtubule affinity-regulating kinase)
family displays the largest ratio of
pseudogenes to functional genes (28/4), followed
by p70S6K (4/1), Erk3 (4/1), phosphorylase
kinase γ1 (3/1), and casein kinase 1α
(3/1). Frequent copying of a gene by retroviral 
insertion might indicate a functional role for the 
gene in retroviral function, but no viral function 
or source for MARK genes is yet known. 
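
The ranking of families by pseudogene-to-gene ratio can be reproduced from the counts quoted above. In this sketch the counts are taken from the text, while the spelled-out family labels and variable names are chosen here for readability.

```python
from fractions import Fraction

# (pseudogenes, functional genes) as quoted in the text.
counts = {
    "MARK": (28, 4),
    "p70S6K": (4, 1),
    "Erk3": (4, 1),
    "phosphorylase kinase gamma 1": (3, 1),
    "casein kinase 1 alpha": (3, 1),
}

ratios = {name: Fraction(p, f) for name, (p, f) in counts.items()}
# Highest ratio first; ties keep the order in which they were listed.
ranked = sorted(ratios, key=ratios.get, reverse=True)

print(ranked[0], ratios[ranked[0]])  # MARK 7
```

MARK stands out at seven pseudogenes per functional gene, against four or three for the next families.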

Comparison with Sequence Databases 

We compared our nonredundant set of cloned 
and predicted kinase protein sequences with the 
published predictions from Celera and public 
genome projects (2, 5) and with a recent release 
of the public GenPept database (10). Figure 2 
shows the extent to which the best match in each 
database agrees with our sequences. All three 
databases contain at least fragments of most 
kinases, but far fewer genes are in perfect agree- 
ment. In many cases the public sequences come 
from partial clones that lack the NH2- or
COOH-termini (43 and 15 genes, respectively),
often from large-scale sequencing projects that 
do not individually annotate sequences. In other 
cases, the public sequence has overextended the 
true start site where upstream stop codons are 
absent. We used similarity to rodent orthologs to
trim sequences to a strongly predicted transla- 
tional start site in nine cases. Other discrepan- 
cies come from sequencing errors, alternative 
splicing, and sequencing of partially spliced
cDNAs. In all cases, our unique sequence is
supported by strong sequence similarity to ho- 
mologs or by cDNA cloning. 
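
Fig. 2 plots, for each database, the number of genes whose best match differs from our sequence by more than a given percentage. A minimal sketch of that calculation follows; the per-gene differences are made-up values and the function name is hypothetical, while the thresholds (2%, 50%, 95%) are those used in the figure's inset table.

```python
def genes_exceeding(differences, thresholds):
    """For each threshold, count genes whose % difference is greater."""
    return {t: sum(1 for d in differences if d > t) for t in thresholds}


# Made-up per-gene percentage differences against one database.
example_differences = [0.0, 0.0, 1.5, 3.0, 12.0, 60.0, 100.0]

print(genes_exceeding(example_differences, [2, 50, 95]))
# {2: 4, 50: 2, 95: 1}
```

A gene with no match at all would count as 100% different, so it appears at every threshold.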

In some cases, our additional sequence 
greatly changes the predicted function of a 
gene, such as the addition of a predicted 
signal peptide to the Lmrl tyrosine kinase; 
the previously published form of this gene 
(AATYK) was based on a cDNA lacking this 
domain, which created a cytoplasmic protein 
(37). We also identified full-length forms of 
two related new genes, Lmr2 and Lmr3, 
which together form a new family of predict- 
ed receptor tyrosine kinases with vestigial 
extracellular regions. Their biological roles 
are currently under investigation. 

Gene predictions from the public genome
project (Ensembl) and Celera differ from those
we obtained largely as a result of misprediction
of exon boundaries and splitting of single genes
into multiple predicted genes. Ensembl incorporates
public sequence data from RefSeq and
Swiss-Prot, giving perfect agreement with our
sequences for many genes. The distance between
the GenPept and Ensembl traces in Fig. 2
indicates the extent of recent new sequence
publication from large-scale cDNA sequencing
projects and individual cloning driven by
genomic data. The Celera predictions were entirely
computational, and so have very few perfect
predictions. However, for genes not present
in public databases, many Celera predictions
agree better with our sequences than those from
Ensembl (not shown).

A comparison with "known" protein kinases
encounters several problems with over-
and under-classification of genes as kinases, as
well as with partial sequences. GenPept contains
multiple sequences for most kinases,
many of which are partial fragments or contain
multiple sequencing errors. It also contains chimeric
genes such as the nonexistent zona pellucida
kinase (38). The proliferation of different
names for the same kinase adds to the problem
of creating an accurate nonredundant list of
kinases. Ensembl and Celera predictions include
several pseudogenes (36 and 29, respectively),
and also annotate as kinases a number
of genes that are homologous to noncatalytic
regulatory subunits of protein kinase complexes
or to kinases other than protein kinases.

All 518 kinases are found in at least one of
the expressed sequence databases (dbEST,
Incyte, and GenBank cDNAs), indicating that
all are genuine, transcribed genes. Many kinases
are expressed in low amounts in a
restricted distribution, so the presence of all
kinases in EST or cDNA databases implies
that these databases contain fragments of
most human genes.

Fig. 2. Comparison of our kinase protein sequences with the best matches in Celera, Ensembl, and
GenPept databases. Each point shows the number of genes for which the percentage difference
between our sequence and the database is greater than the value indicated (x-axis: sequence
difference, %). Inset table indicates the number of sequences where the difference between our
sequence and the closest database match is >2%, >50%, or >95%.

Summary

The sequencing of the human genome has provided
a starting point for the identification of
most, if not all, human members of the eukaryotic
protein kinase superfamily, and many atypical
kinases. We used the published human
genome sequences, combined with other sequence
databases and directed cloning and sequencing
of individual genes to discover, extend,
or correct 125 kinase gene sequences, and
define a nonredundant set of 518 human protein
kinase genes. This set accounts for almost all
human protein phosphorylation and collectively
mediates most cellular signal transduction
and many other processes. Comparative sequence
analysis and mapping predict function
and possible disease association for many
kinases, and give clues to their evolutionary
origin. Comprehensive kinome-scale approaches
are now feasible, including RNA
and protein expression profiling, and high-throughput
functional assays using constitutively
active and dominant-negative kinase
constructs. These will facilitate the study of
the role of kinases in a wide range of biological
processes, and the development of selective
inhibitors and activators for research and
therapeutic purposes.

This large and well-curated sequence set 
also casts a light on the current state of 
human genome analysis. All 518 genes are 
covered by some EST sequence, and ~90%
are present in gene predictions from the 
Celera and public genome databases, al- 
though those predictions are often fragmen- 
tary or inaccurate and are frequently misan- 
notated (39). 
APPENDIX B

Blast 2 Sequences results

BLAST 2 SEQUENCES RESULTS VERSION BLASTP 2.2.17 [Aug-26-2007]

Matrix BLOSUM62 gap open: 11 gap extension: 1
x_dropoff: 0 expect: 10.0000 wordsize: 3

Sequence 1: unnamed protein product
Length = 273 (1 .. 273)

Sequence 2: unnamed protein product
Length = 271 (1 .. 271)

NOTE: Bitscore and expect value are calculated based on the size of the nr database.

Score = 226 bits (576), Expect = 2e-57
Identities = 106/260 (40%), Positives = 163/260 (62%), Gaps = 1/260 (0%)

Query  8    FKVGNLLGKGSFAGVYRAESIHTGLEVAIKMIDKKAMYKAGMVQRVQNEVKIHCQLKHPS  67
            +K     GKG FA  Y    + T    A K++ KK M K    ++   E+ IH  L HP+
Sbjct  7    YKRMRFFGKGGFAKCYEIIDVETDDVFAGKIVSKKLMIKHNQKEKTAQEITIHRSLNHPN  66

Query  68   VLELYNYFEDNNYVYLVLEMCHNGEMNRYLKNRMKPFSEREARHFMHQIITGMLYLHSHG  127
            +++ +NYFED+  +Y+VLE+C    M   L  R K  +E E R++++QII G+ YLH +
Sbjct  67   IVKFHNYFEDSQNIYIVLELCKKRSMME-LHKRRKSITEFECRYYIYQIIQGVKYLHDNR  125

Query  128  ILHRDLTLSNILLTRNMNIKIADFGLATQLNMPHEKHYTLCGTPNYISPEIATRSAHGLE  187
            I+HRDL L N+ L   +++KI DFGLAT++    E+  TLCGT NYI+PEI T+  H  E
Sbjct  126  IIHRDLKLGNLFLNDLLHVKIGDFGLATRIEYEGERKKTLCGTANYIAPEILTKKGHSFE  185

Query  188  SDIWSLGCMSYTLLIGRPPFDTDTVKNTLNKVVLADYEMPAFLSREAQDLIHQLLRRNPA  247
             DIWS+GC+ YTLL+G+PPF+T T+K+T +K+   +Y +P++L + A D++  +L+ NP
Sbjct  186  VDIWSIGCVMYTLLVGQPPFETKTLKDTYSKIKKCEYRVPSYLRKPAADMVIAMLQPNPE  245

Query  248  DRLSLSSVLDHPFMSRNPSP  267
             R ++  +L+  F+  +  P
Sbjct  246  SRPAIGQLLNFEFLKGSKVP  265

CPU time: 0.09 user secs. 0.03 sys. secs 0.12 total secs.

http://www.ncbi.nlm.nih.gov/BLAST/bl2seq/wblast2.cgi70
10/1/2007
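
The identity figures reported by BLAST can be checked directly from an aligned pair. The sketch below counts identical columns and gap columns; the function name is hypothetical, and the two strings are the final block of the first alignment above (Query 248-267, Sbjct 246-265). The last line shows that the reported 40% is consistent with 106/260 truncated to a whole number, an observation from the printed values rather than a statement about BLAST internals.

```python
def alignment_stats(query, subject, gap="-"):
    """Count identities and gap columns in two equal-length aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(q == s and q != gap for q, s in zip(query, subject))
    gaps = sum(q == gap or s == gap for q, s in zip(query, subject))
    return identities, gaps, len(query)


# Final block of the first alignment (Query 248-267, Sbjct 246-265).
query = "DRLSLSSVLDHPFMSRNPSP"
subject = "SRPAIGQLLNFEFLKGSKVP"

identities, gaps, length = alignment_stats(query, subject)
print(identities, gaps, length)  # 4 0 20

# Whole-alignment identity as reported: 106 of 260 columns.
print(100 * 106 // 260)  # 40
```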



Blast 2 Sequences results

BLAST 2 SEQUENCES RESULTS VERSION BLASTP 2.2.17 [Aug-26-2007]

Matrix BLOSUM62 gap open: 11 gap extension: 1
x_dropoff: 0 expect: 10.0000 wordsize: 3

Sequence 1: unnamed protein product
Length = 273 (1 .. 273)

Sequence 2: unnamed protein product
Length = 271 (1 .. 271)

NOTE: Bitscore and expect value are calculated based on the size of the nr database.

Score = 213 bits (541), Expect = 2e-53
Identities = 105/257 (40%), Positives = 151/257 (58%), Gaps = 2/257 (0%)

Query  11   GNLLGKGSFAGVYRAESIHTGLEVAIKMIDKKAMYKAGMVQRVQNEVKIHCQLKHPSVLE  70
            G +LGKG FA  Y    +      A K+I    + K    +++  E ++H  L H  V++
Sbjct  11   GKVLGKGGFAKCYEMTDLTNNKVYAAKIIPHSRVAKPHQREKIDKE-ELHRLLHHKHVVQ  69

Query  71   LYNYFEDNNYVYLVLEMCHNGEMNRYLKNRMKPFSEREARHFMHQIITGMLYLHSHGILH  130
             Y+YFED   +Y++LE C    M   LK R K  +E E R+++ QI++G+ YLH   ILH
Sbjct  70   FYHYFEDKENIYILLEYCSRRSMAHILKAR-KVLTEPEVRYYLRQIVSGLKYLHEQEILH  128

Query  131  RDLTLSNILLTRNMNIKIADFGLATQLNMPHEKHYTLCGTPNYISPEIATRSAHGLESDI  190
            [match line only partly legible: "M +K+ DFGLA +L", "T+CGTPNY+SPE+"]
Sbjct  129  [illegible]  188

Query  191  WSLGCMSYTLLIGRPPFDTDTVKNTLNKVVLADYEMPAFLSREAQDLIHQLLRRNPADRL  250
            W+LGC+ YT+L+GRPPF+T  +K T   +  A Y MP+ L   A+ LI  +L +NP DR
Sbjct  189  WALGCVMYTMLLGRPPFETTNLKETYRCIREARYTMPSSLLAPAKHLIASMLSKNPEDRP  248

Query  251  SLSSVLDHPFMSRNPSP  267
            SL  ++ H F  +  +P
Sbjct  249  SLDDIIRHDFFLQGFTP  265

CPU time: 0.10 user secs. 0.05 sys. secs 0.15 total secs.



Blast 2 Sequences results

BLAST 2 SEQUENCES RESULTS VERSION BLASTP 2.2.17 [Aug-26-2007]

Matrix BLOSUM62 gap open: 11 gap extension: 1
x_dropoff: 0 expect: 10.0000 wordsize: 3

Sequence 1: unnamed protein product
Length = 273 (1 .. 273)

Sequence 2: unnamed protein product
Length = 275 (1 .. 275)

NOTE: Bitscore and expect value are calculated based on the size of the nr database.

Score = 189 bits (480), Expect = 2e-46
Identities = 101/262 (38%), Positives = 153/262 (58%), Gaps = 6/262 (2%)

Query  1    IGERIEDFKVGNLLGKGSFAGVYRAESIHTGLEVAIKMIDKKAMYKAGMVQRVQNEVKIH  60
            I  R +D+  G+ LG+G FA  ++ +   +G   A K + K ++      +++ +E++IH
Sbjct  1    IKTRGKDYHRGHFLGEGGFARCFQIKD-DSGEIFAAKTVAKASIKSEKTRKKLLSEIQIH  59

Query  61   CQLKHPSVLELYNYFEDNNYVYLVLEMCHNGEMNRYLKNRMKPFSEREARHFMHQIITGM  120
              + HP++++  + FED++ VY++LE+C NG +   LK R K  +E E R F  QI   +
Sbjct  60   KSMSHPNIVQFIDCFEDDSNVYILLEICPNGSLMELLKRR-KVLTEPEVRFFTTQICGAI  118

Query  121  LYLHSHGILHRDLTLSNILLTRNMNIKIADFGLATQLNMPHEKHYTLCGTPNYISPEI--  178
             Y+HS  ++HRDL L NI    N N+KI DFGLA  L    E+ YT+CGTPNYI+PE+
Sbjct  119  KYMHSRRVIHRDLKLGNIFFDSNYNLKIGDFGLAAVLANESERKYTICGTPNYIAPEVLM  178

Query  179  ATRSAHGLESDIWSLGCMSYTLLIGRPPFDTDTVKNTLNKVVLADYEMPAF--LSREAQD  236
               S H  E DIWSLG M Y LLIG+PPF    V     ++   D+  P    +S E +
Sbjct  179  GKHSGHSFEVDIWSLGVMLYALLIGKPPFQARDVNTIYERIKCRDFSFPRDKPISDEGKI  238

Query  237  LIHQLLRRNPADRLSLSSVLDH  258
            LI  +L  +P +R SL+ ++D+
Sbjct  239  LIRDILSLDPIERPSLTEIMDY  260

CPU time: 0.08 user secs. 0.03 sys. secs 0.11 total secs.



Blast 2 Sequences results

BLAST 2 SEQUENCES RESULTS VERSION BLASTP 2.2.17 [Aug-26-2007]

Matrix BLOSUM62 gap open: 11 gap extension: 1
x_dropoff: 0 expect: 10.0000 wordsize: 3

Sequence 1: unnamed protein product
Length = 273 (1 .. 273)

Sequence 2: unnamed protein product
Length = 272 (1 .. 272)

NOTE: Bitscore and expect value are calculated based on the size of the nr database.

Score = 204 bits (519), Expect = 7e-51
Identities = 95/257 (36%), Positives = 155/257 (60%), Gaps = 1/257 (0%)

Query  11   GNLLGKGSFAGVYRAESIHTGLEVAIKMIDKKAMYKAGMVQRVQNEVKIHCQLKHPSVLE  70
            G  LGKG FA  +      T    A K++ K  + K    +++  E+ IH  L H  V+
Sbjct  11   GRFLGKGGFAKCFEISDADTKEVFAGKIVPKSLLLKPHQKEKMSMEISIHRSLAHQHVVG  70

Query  71   LYNYFEDNNYVYLVLEMCHNGEMNRYLKNRMKPFSEREARHFMHQIITGMLYLHSHGILH  130
             +++FED+++V++VLE+C    +   L  R K  +E EAR+++ QI+ G  YLH + ++H
Sbjct  71   FHDFFEDSDFVFVVLELCRRRSLLE-LHKRRKALTEPEARYYLRQIVLGCQYLHRNQVIH  129

Query  131  RDLTLSNILLTRNMNIKIADFGLATQLNMPHEKHYTLCGTPNYISPEIATRSAHGLESDI  190
            RDL L N+ L  ++ +KI DFGLAT++    E+  TLCGTPNYI+PE+ ++  H  E D+
Sbjct  130  RDLKLGNLFLNEDLEVKIGDFGLATKVEYEGERKKTLCGTPNYIAPEVLSKKGHSFEVDV  189

Query  191  WSLGCMSYTLLIGRPPFDTDTVKNTLNKVVLADYEMPAFLSREAQDLIHQLLRRNPADRL  250
            WS+GC+ YTLL+G+PPF+T  +K T  ++   +Y +P  ++  A  LI ++L+ +P  R
Sbjct  190  WSIGCIMYTLLVGKPPFETSCLKETYLRIKKNEYSIPKHINPVAASLIQKMLQTDPTARP  249

Query  251  SLSSVLDHPFMSRNPSP  267
            ++  +L+  F +    P
Sbjct  250  TIHELLNDEFFTSGYIP  266

CPU time: 0.09 user secs. 0.03 sys. secs 0.12 total secs.



